11 research outputs found
Belief State Planning for Autonomously Navigating Urban Intersections
Urban intersections represent a complex environment for autonomous vehicles
with many sources of uncertainty. The vehicle must plan in a stochastic
environment with potentially rapid changes in driver behavior. Providing an
efficient strategy to navigate through urban intersections is a difficult task.
This paper frames the problem of navigating unsignalized intersections as a
partially observable Markov decision process (POMDP) and solves it using a
Monte Carlo sampling method. Empirical results in simulation show that the
resulting policy outperforms a threshold-based heuristic strategy on several
relevant metrics that measure both safety and efficiency.
Comment: 6 pages, 6 figures, accepted to IV201
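The paper's Monte Carlo approach can be illustrated with a toy sketch: sample states from the current belief, simulate each candidate action, and act greedily on the estimated values. The intersection model below (driver types, rewards, probabilities) is invented for illustration and is not the paper's simulator.

```python
import random

# Toy belief-state planning by Monte Carlo sampling (hypothetical model).
# Belief: probability that the crossing driver is "aggressive" vs "yielding".
ACTIONS = ["wait", "go"]

def simulate_reward(state, action, rng):
    # Invented reward model: going while the other driver is aggressive
    # risks a collision penalty; waiting incurs a small delay cost.
    if action == "wait":
        return -1.0
    if state == "aggressive":
        return -100.0 if rng.random() < 0.8 else 10.0
    return 10.0

def mc_plan(belief, n_samples=1000, seed=0):
    # Estimate Q(b, a) by sampling states from the belief and averaging
    # simulated returns, then act greedily on the estimates.
    rng = random.Random(seed)
    q = {}
    for a in ACTIONS:
        total = 0.0
        for _ in range(n_samples):
            s = "aggressive" if rng.random() < belief["aggressive"] else "yielding"
            total += simulate_reward(s, a, rng)
        q[a] = total / n_samples
    return max(q, key=q.get), q

action, q = mc_plan({"aggressive": 0.6, "yielding": 0.4})
print(action, q)
```

With a mostly-aggressive belief, the sampled estimates favor waiting; shifting the belief mass toward "yielding" flips the greedy choice.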
Model Based Residual Policy Learning with Applications to Antenna Control
Non-differentiable controllers and rule-based policies are widely used for
controlling real systems such as telecommunication networks and robots.
Specifically, parameters of mobile network base station antennas can be
dynamically configured by these policies to improve users' coverage and quality
of service. Motivated by the antenna tilt control problem, we introduce
Model-Based Residual Policy Learning (MBRPL), a practical reinforcement
learning (RL) method. MBRPL enhances existing policies through a model-based
approach, leading to improved sample efficiency and a decreased number of
interactions with the actual environment when compared to off-the-shelf RL
methods. To the best of our knowledge, this is the first paper that examines a
model-based approach for antenna control. Experimental results reveal that our
method delivers strong initial performance while improving sample efficiency
over previous RL methods, which is one step towards deploying these algorithms
in real networks.
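The core residual idea, executing the rule-based base action plus a small learned correction, can be sketched as follows. The base tilt rule, feature shapes, and the simplistic update are all assumptions for illustration, not MBRPL's actual model-based machinery.

```python
import numpy as np

# Sketch of residual policy learning (invented toy setup): the executed
# action is the non-differentiable base action plus a learned residual.
def base_policy(state):
    # Hand-crafted rule (assumed): step the tilt towards a fixed target angle.
    target_tilt = 6.0
    return np.clip(target_tilt - state, -1.0, 1.0)

class ResidualPolicy:
    def __init__(self, n_features=1, lr=0.1):
        self.w = np.zeros(n_features)  # residual starts at zero: behave like the base
        self.lr = lr

    def action(self, state):
        residual = self.w @ np.atleast_1d(state)
        return base_policy(state) + residual

    def update(self, state, advantage):
        # Simplistic gradient step nudging the residual in the direction
        # that increased return (stand-in for a full RL update).
        self.w += self.lr * advantage * np.atleast_1d(state)
```

Because the residual is initialized to zero, the agent starts at the base policy's performance, which is one intuition behind the strong initial performance reported in the abstract.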
Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes
Autonomous systems are often required to operate in partially observable
environments. They must reliably execute a specified objective even with
incomplete information about the state of the environment. We propose a
methodology to synthesize policies that satisfy a linear temporal logic formula
in a partially observable Markov decision process (POMDP). By formulating a
planning problem, we show how to use point-based value iteration methods to
efficiently approximate the maximum probability of satisfying a desired logical
formula and compute the associated belief state policy. We demonstrate that our
method scales to large POMDP domains and provides strong bounds on the
performance of the resulting policy.
Comment: 8 pages, 3 figures, AAAI 202
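Point-based value iteration itself can be sketched on the classic tiger POMDP: maintain a set of alpha vectors and repeatedly apply the Bellman backup only at a fixed set of belief points. The parameters below are the standard toy problem, not the paper's LTL model-checking formulation.

```python
import numpy as np

# Toy point-based value iteration on the classic tiger POMDP.
S, A, O = 2, 3, 2          # states, actions (listen/open-left/open-right), observations
gamma = 0.95

# T[a, s, s']: listening keeps the state; opening a door resets it uniformly.
T = np.zeros((A, S, S))
T[0] = np.eye(S)
T[1] = T[2] = 0.5

# Z[a, s', o]: listening is 85% accurate; opening is uninformative.
Z = np.full((A, S, O), 0.5)
Z[0] = [[0.85, 0.15], [0.15, 0.85]]

# R[a, s]: listening costs 1; correct door +10, tiger door -100.
R = np.array([[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]])

# Fixed set of belief points over "tiger behind left door".
B = [np.array([p, 1 - p]) for p in (0.05, 0.25, 0.5, 0.75, 0.95)]

def backup(b, Gamma):
    # One point-based Bellman backup at belief b over alpha-vector set Gamma.
    best_val, best_alpha = -np.inf, None
    for a in range(A):
        alpha = R[a].copy()
        for o in range(O):
            # Back-project each alpha vector through T and Z, keep the best at b.
            proj = [T[a] @ (Z[a][:, o] * g) for g in Gamma]
            alpha = alpha + gamma * proj[int(np.argmax([b @ p for p in proj]))]
        v = b @ alpha
        if v > best_val:
            best_val, best_alpha = v, alpha
    return best_alpha

Gamma = [np.zeros(S)]
for _ in range(120):
    Gamma = [backup(b, Gamma) for b in B]

val = max(np.array([0.5, 0.5]) @ g for g in Gamma)
print(round(float(val), 2))
```

The value function stays a piecewise-linear maximum over one alpha vector per belief point, which is what keeps point-based methods tractable on large POMDPs.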
Towards addressing training data scarcity challenge in emerging radio access networks: a survey and framework
The future of cellular networks is contingent on artificial intelligence (AI) based automation, particularly for radio access network (RAN) operation, optimization, and troubleshooting. To achieve such zero-touch automation, a myriad of AI-based solutions are being proposed in the literature to model and optimize network behavior. However, to work reliably, AI-based automation requires a deluge of training data. Consequently, the success of the proposed AI solutions is limited by a fundamental challenge faced by the cellular network research community: the scarcity of training data. In this paper, we present an extensive review of classic and emerging techniques to address this challenge. We first identify the common data types in the RAN and their known use cases. We then present a taxonomized survey of techniques used in the literature to address training data scarcity for various data types. This is followed by a framework to address the training data scarcity challenge. The proposed framework builds on the available information and a combination of techniques, including interpolation, domain-knowledge-based methods, generative adversarial networks, transfer learning, autoencoders, few-shot learning, simulators, and testbeds. Potential new techniques to enrich scarce data in cellular networks are also proposed, such as matrix completion and domain-knowledge-based techniques leveraging different types of network geometries and network parameters. In addition, an overview of state-of-the-art simulators and testbeds is presented to make readers aware of current and emerging platforms for accessing real data in order to overcome the data scarcity challenge.
The extensive survey of techniques for addressing training data scarcity, combined with the proposed framework for selecting a suitable technique for a given type of data, can assist researchers and network operators in choosing appropriate methods to overcome the data scarcity challenge when applying AI to radio access network automation.
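As a minimal illustration of one technique from the proposed framework, interpolation-based augmentation can enrich a scarce measurement set by generating convex combinations of real samples. The feature names and distributions below are invented for the sketch.

```python
import numpy as np

# Hypothetical scarce RAN measurement set: columns are [RSRP dBm, SINR dB]
# (invented features and distributions, for illustration only).
rng = np.random.default_rng(0)
real = rng.normal(loc=[-85.0, 12.0], scale=[5.0, 3.0], size=(20, 2))

def interpolate_augment(X, n_new, rng):
    # Each synthetic sample lies on the segment between two random real
    # samples, a mixup/SMOTE-style convex interpolation.
    i = rng.integers(0, len(X), size=n_new)
    j = rng.integers(0, len(X), size=n_new)
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return lam * X[i] + (1 - lam) * X[j]

synthetic = interpolate_augment(real, 100, rng)
augmented = np.vstack([real, synthetic])
```

By construction the synthetic points stay inside the convex hull of the real data, which makes interpolation conservative compared to generative approaches such as GANs.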
A Graph Attention Learning Approach to Antenna Tilt Optimization
6G will move mobile networks towards increasing levels of complexity. To deal
with this complexity, optimization of network parameters is key to ensure high
performance and timely adaptivity to dynamic network environments. The
optimization of the antenna tilt provides a practical and cost-efficient method
to improve coverage and capacity in the network. Previous methods based on
Reinforcement Learning (RL) have shown great promise for tilt optimization by
learning adaptive policies outperforming traditional tilt optimization methods.
However, most existing RL methods are based on single-cell feature
representations, which fail to fully characterize the agent state, resulting in
suboptimal performance. Moreover, most such methods lack scalability, owing to
state-action space explosion, as well as the ability to generalize. In this paper, we propose a
Graph Attention Q-learning (GAQ) algorithm for tilt optimization. GAQ relies on
a graph attention mechanism to select relevant neighbor information, improve
the agent state representation, and update the tilt control policy based on a
history of observations using a Deep Q-Network (DQN). We show that GAQ
efficiently captures important network information and outperforms standard DQN
with local information by a large margin. In addition, we demonstrate its
ability to generalize to network deployments of different sizes and densities.
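The graph attention idea at the heart of GAQ, a cell weighting its neighbors' features before aggregating them into its state, can be sketched as follows. The dimensions, weights, and leaky-ReLU scoring are placeholders in the style of standard graph attention, not the paper's trained network.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Graph-attention aggregation sketch (invented shapes and random weights):
# a cell attends over its neighbors' features to build a richer DQN state.
rng = np.random.default_rng(1)
d = 4
W = rng.normal(size=(d, d))     # shared linear transform
a_vec = rng.normal(size=2 * d)  # attention scoring parameters

def attend(self_feat, neighbor_feats):
    h_i = W @ self_feat
    hs = [W @ h for h in neighbor_feats]
    # Score each neighbor against the self embedding, leaky-ReLU then softmax.
    scores = np.array([a_vec @ np.concatenate([h_i, h_j]) for h_j in hs])
    weights = softmax(np.maximum(scores, 0.2 * scores))
    # The weighted sum of neighbor embeddings augments the local state
    # before it is fed to the Q-network.
    return sum(w * h for w, h in zip(weights, hs))

agg = attend(rng.normal(size=d), [rng.normal(size=d) for _ in range(3)])
```

Because the transform and scoring parameters are shared across cells, the same learned attention applies to any number of neighbors, which is one intuition for the generalization across deployment sizes noted above.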